An Almost Constant Lower Bound of the Isoperimetric Coefficient in the KLS Conjecture
We prove an almost constant lower bound of the isoperimetric coefficient in the KLS conjecture. The lower bound has the dimension dependency $d^{-o_d(1)}$. When the dimension is large enough, our lower bound is tighter than the previous best bound, which has the dimension dependency $d^{-1/4}$. Improving the current best lower bound of the isoperimetric coefficient in the KLS conjecture has many implications, including improvements of the current best bounds in Bourgain's slicing conjecture and in the thin-shell conjecture, better concentration inequalities for Lipschitz functions of log-concave measures, and better mixing time bounds for MCMC sampling algorithms on log-concave measures.
Comment: 25 pages, 1 figure, accepted in GAFA journal
When does Metropolized Hamiltonian Monte Carlo provably outperform Metropolis-adjusted Langevin algorithm?
We analyze the mixing time of Metropolized Hamiltonian Monte Carlo (HMC) with the leapfrog integrator to sample from a distribution on $\mathbb{R}^d$ whose log-density is smooth, has Lipschitz Hessian in Frobenius norm, and satisfies isoperimetry. We bound the gradient complexity to reach $\epsilon$ error in total variation distance from a warm start by $\tilde{O}(d^{1/4}\mathrm{polylog}(1/\epsilon))$ and demonstrate the benefit of choosing the number of leapfrog steps to be larger than 1. To surpass the previous analysis of the Metropolis-adjusted Langevin algorithm (MALA), which has $\tilde{O}(d^{1/2}\mathrm{polylog}(1/\epsilon))$ dimension dependency in Wu et al. (2022), we reveal a key feature of our proof: the joint distribution of the location and velocity variables of the discretization of the continuous HMC dynamics stays approximately invariant. This key feature, when shown via induction over the number of leapfrog steps, enables us to obtain estimates on moments of various quantities that appear in the acceptance rate control of Metropolized HMC. Moreover, to deal with another bottleneck in the literature, the control of the overlap of HMC proposal distributions, we provide a new approach to upper bound the Kullback-Leibler divergence between push-forwards of the Gaussian distribution through HMC dynamics initialized at two different points. Notably, our analysis does not require log-concavity or independence of the marginals, and relies only on an isoperimetric inequality. To illustrate the applicability of our result, several examples of natural functions that fall into our framework are discussed.
Comment: 42 pages
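For concreteness, here is a minimal sketch of the algorithm analyzed above: Metropolized HMC with a leapfrog integrator. It assumes access to the target's log-density and gradient; all names and parameter values are illustrative, not from the paper.

```python
import numpy as np

def leapfrog(x, v, grad_log_p, step_size, n_steps):
    """Leapfrog integration of Hamiltonian dynamics for H(x,v) = -log p(x) + |v|^2/2."""
    v = v + 0.5 * step_size * grad_log_p(x)    # half step for velocity
    for _ in range(n_steps - 1):
        x = x + step_size * v                  # full step for position
        v = v + step_size * grad_log_p(x)      # full step for velocity
    x = x + step_size * v
    v = v + 0.5 * step_size * grad_log_p(x)    # final half step for velocity
    return x, v

def metropolized_hmc(log_p, grad_log_p, x0, step_size, n_steps, n_iters, rng=None):
    """Metropolized HMC: leapfrog proposal plus accept/reject targeting exp(log_p)."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    samples = []
    for _ in range(n_iters):
        v = rng.standard_normal(x.shape)       # resample velocity each iteration
        x_new, v_new = leapfrog(x, v, grad_log_p, step_size, n_steps)
        # Leapfrog is volume-preserving and time-reversible, so the Metropolis
        # ratio is exp(H_old - H_new) with H(x,v) = -log p(x) + |v|^2/2.
        log_accept = (log_p(x_new) - 0.5 * v_new @ v_new) - (log_p(x) - 0.5 * v @ v)
        if np.log(rng.uniform()) < log_accept:
            x = x_new
        samples.append(x.copy())
    return np.array(samples)

# Example: standard Gaussian target; n_steps > 1 is the regime studied above.
samples = metropolized_hmc(lambda x: -0.5 * x @ x, lambda x: -x,
                           x0=np.zeros(10), step_size=0.2, n_steps=5, n_iters=1000)
```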
Fast and Robust Archetypal Analysis for Representation Learning
We revisit a pioneering unsupervised learning technique called archetypal analysis, which is related to successful data analysis methods such as sparse coding and non-negative matrix factorization. Since it was proposed, archetypal analysis has not gained much popularity, even though it produces more interpretable models than other alternatives. Because no efficient implementation has ever been made publicly available, its application to important scientific problems may have been severely limited. Our goal is to bring archetypal analysis back into favour. We propose a fast optimization scheme using an active-set strategy, and provide an efficient open-source implementation interfaced with Matlab, R, and Python. Then, we demonstrate the usefulness of archetypal analysis for computer vision tasks, such as codebook learning, signal classification, and large image collection visualization.
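The paper's solver uses an active-set strategy; the following is only a minimal projected-gradient sketch of the underlying problem $\min_{A,B} \|X - XBA\|_F^2$ with columns of $A$ and $B$ on the simplex, so archetypes are convex combinations of data points. All names are illustrative and this is not the paper's algorithm.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of each column of v onto the probability simplex."""
    u = np.sort(v, axis=0)[::-1]                    # sort each column descending
    css = np.cumsum(u, axis=0) - 1.0
    idx = np.arange(1, v.shape[0] + 1)[:, None]
    rho = (u - css / idx > 0).sum(axis=0)           # support size per column
    theta = css[rho - 1, np.arange(v.shape[1])] / rho
    return np.maximum(v - theta, 0.0)

def archetypal_analysis(X, k, n_iters=200, lr=1e-3, rng=None):
    """Alternating projected gradient for min ||X - X B A||_F^2,
    A in R^{k x n}, B in R^{n x k}, columns constrained to the simplex."""
    rng = np.random.default_rng() if rng is None else rng
    _, n = X.shape
    A = project_simplex(rng.random((k, n)))
    B = project_simplex(rng.random((n, k)))
    for _ in range(n_iters):
        Z = X @ B                                   # current archetypes
        R = X - Z @ A                               # reconstruction residual
        A = project_simplex(A + lr * (Z.T @ R))     # gradient step on codes A
        B = project_simplex(B + lr * (X.T @ (R @ A.T)))  # gradient step on B
    return X @ B, A                                 # archetypes and codes
```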
A Simple Proof of the Mixing of Metropolis-Adjusted Langevin Algorithm under Smoothness and Isoperimetry
We study the mixing time of the Metropolis-adjusted Langevin algorithm (MALA) for sampling a target density on $\mathbb{R}^d$. We assume that the target density satisfies $\psi_\mu$-isoperimetry and that the operator norm and trace of its Hessian are bounded by $L$ and $\Upsilon$, respectively. Our main result establishes that, from a warm start, to achieve $\epsilon$-total variation distance to the target density, MALA mixes in $O\left(\frac{(L\Upsilon)^{1/2}}{\psi_\mu^2}\log(1/\epsilon)\right)$ iterations. Notably, this result holds beyond the log-concave sampling setting, and the mixing time depends only on $\Upsilon$ rather than its upper bound $Ld$. In the $m$-strongly logconcave and $L$-log-smooth sampling setting, our bound recovers the previous minimax mixing bound of MALA~\cite{wu2021minimax}.
Comment: 16 pages
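For orientation, a minimal sketch of MALA itself (the standard algorithm, not code from the paper): a Langevin proposal followed by a Metropolis-Hastings correction, whose asymmetric proposal density must appear in the acceptance ratio.

```python
import numpy as np

def mala(log_p, grad_log_p, x0, step_size, n_iters, rng=None):
    """Metropolis-adjusted Langevin algorithm targeting exp(log_p)."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    samples = []
    for _ in range(n_iters):
        # Langevin proposal: x' ~ N(x + h * grad log p(x), 2h I)
        mean_fwd = x + step_size * grad_log_p(x)
        x_new = mean_fwd + np.sqrt(2.0 * step_size) * rng.standard_normal(x.shape)
        # Hastings correction: the proposal is asymmetric, so the reverse
        # transition density enters the acceptance ratio.
        mean_bwd = x_new + step_size * grad_log_p(x_new)
        log_q_fwd = -np.sum((x_new - mean_fwd) ** 2) / (4.0 * step_size)
        log_q_bwd = -np.sum((x - mean_bwd) ** 2) / (4.0 * step_size)
        log_accept = log_p(x_new) + log_q_bwd - log_p(x) - log_q_fwd
        if np.log(rng.uniform()) < log_accept:
            x = x_new
        samples.append(x.copy())
    return np.array(samples)
```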
Domain adaptation under structural causal models
Domain adaptation (DA) arises as an important problem in statistical machine
learning when the source data used to train a model is different from the
target data used to test the model. Recent advances in DA have mainly been
application-driven and have largely relied on the idea of a common subspace for
source and target data. To understand the empirical successes and failures of
DA methods, we propose a theoretical framework via structural causal models
that enables analysis and comparison of the prediction performance of DA
methods. This framework also allows us to itemize the assumptions needed for
the DA methods to have a low target error. Additionally, with insights from our
theory, we propose a new DA method called CIRM that outperforms existing DA
methods when both the covariates and label distributions are perturbed in the
target data. We complement the theoretical analysis with extensive simulations
to show the necessity of the devised assumptions. Reproducible synthetic and
real data experiments are also provided to illustrate the strengths and
weaknesses of DA methods when parts of the assumptions in our theory are
violated.
Comment: 80 pages, 22 figures, accepted in JMLR
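To make the problem setting concrete, here is a toy simulation (ours, not from the paper) of a linear structural causal model in which an intervention shifts the mechanism of an anticausal feature in the target domain: a source-trained regression that uses that feature degrades, while one restricted to the causal parent stays stable.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

def simulate(x2_shift):
    """Linear SCM: X1 -> Y -> X2. In the target domain an intervention
    shifts the mechanism generating the anticausal feature X2."""
    x1 = rng.standard_normal(n)
    y = 2.0 * x1 + rng.standard_normal(n)            # Y is caused by X1
    x2 = 0.5 * y + x2_shift + rng.standard_normal(n) # X2 is caused by Y
    return np.column_stack([x1, x2]), y

X_src, y_src = simulate(x2_shift=0.0)                # source domain
X_tgt, y_tgt = simulate(x2_shift=3.0)                # target domain

# OLS on source using both covariates: relies on the shifted mechanism.
beta = np.linalg.lstsq(X_src, y_src, rcond=None)[0]
print("both covariates: source MSE %.2f, target MSE %.2f" %
      (np.mean((X_src @ beta - y_src) ** 2),
       np.mean((X_tgt @ beta - y_tgt) ** 2)))

# A predictor using only the causal parent X1 is invariant to this shift.
beta1 = np.linalg.lstsq(X_src[:, :1], y_src, rcond=None)[0]
print("causal only:     target MSE %.2f" %
      np.mean((X_tgt[:, :1] @ beta1 - y_tgt) ** 2))
```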
Fast MCMC sampling algorithms on polytopes
We propose and analyze two new MCMC sampling algorithms, the Vaidya walk and the John walk, for generating samples from the uniform distribution over a polytope. Both random walks are sampling algorithms derived from interior point methods. The former is based on the volumetric-logarithmic barrier introduced by Vaidya, whereas the latter uses John's ellipsoids. We show that the Vaidya walk mixes in significantly fewer steps than the logarithmic-barrier-based Dikin walk studied in past work. For a polytope in $\mathbb{R}^d$ defined by $n$ linear constraints, we show that the mixing time from a warm start is bounded as $\mathcal{O}(n^{0.5}d^{1.5})$, compared to the $\mathcal{O}(nd)$ mixing time bound for the Dikin walk. The cost of each step of the Vaidya walk is of the same order as that of the Dikin walk, and at most twice as large in terms of constant pre-factors. For the John walk, we prove an $\mathcal{O}(d^{2.5}\log^4(n/d))$ bound on its mixing time and conjecture that an improved variant of it could achieve a mixing time of $\mathcal{O}(d^2\,\mathrm{polylog}(n/d))$. Additionally, we propose variants of the Vaidya and John walks that mix in polynomial time from a deterministic starting point. The speed-up of the Vaidya walk over the Dikin walk is illustrated in numerical examples.
Comment: 86 pages, 9 figures, First two authors contributed equally
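For orientation, a minimal sketch of the baseline Dikin walk on a polytope $\{x : Ax \le b\}$, stated as a Metropolis-filtered variant. The Vaidya and John walks replace the log-barrier Hessian below with volumetric-barrier and John-ellipsoid analogues; the radius constant here is illustrative, not the tuned value from the paper.

```python
import numpy as np

def dikin_walk(A, b, x0, n_iters, radius=0.5, rng=None):
    """Dikin walk for uniform sampling over {x : A x <= b}.
    Proposals are Gaussians shaped by the log-barrier Hessian
    H(x) = sum_i a_i a_i^T / s_i(x)^2 with slacks s(x) = b - A x."""
    rng = np.random.default_rng() if rng is None else rng
    _, d = A.shape
    x = np.asarray(x0, dtype=float)   # must be strictly feasible

    def hessian(z):
        s = b - A @ z
        return (A / s[:, None] ** 2).T @ A

    samples = []
    for _ in range(n_iters):
        H = hessian(x)
        # Propose y ~ N(x, (radius^2 / d) * H(x)^{-1}).
        L = np.linalg.cholesky(np.linalg.inv(H))
        y = x + (radius / np.sqrt(d)) * (L @ rng.standard_normal(d))
        if np.all(A @ y < b):         # reject infeasible proposals outright
            Hy = hessian(y)
            # Metropolis ratio for the state-dependent Gaussian proposals.
            log_q_xy = 0.5 * np.linalg.slogdet(H)[1] \
                - (d / (2 * radius**2)) * (y - x) @ H @ (y - x)
            log_q_yx = 0.5 * np.linalg.slogdet(Hy)[1] \
                - (d / (2 * radius**2)) * (x - y) @ Hy @ (x - y)
            if np.log(rng.uniform()) < log_q_yx - log_q_xy:
                x = y
        samples.append(x.copy())
    return np.array(samples)
```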
Minimax Mixing Time of the Metropolis-Adjusted Langevin Algorithm for Log-Concave Sampling
We study the mixing time of the Metropolis-adjusted Langevin algorithm (MALA) for sampling from a log-smooth and strongly log-concave distribution. We establish its optimal minimax mixing time under a warm start. Our main contribution is two-fold. First, for a $d$-dimensional log-concave density with condition number $\kappa$, we show that MALA with a warm start mixes in $\kappa\sqrt{d}$ iterations up to logarithmic factors. This improves upon the previous work on the dependency of either the condition number $\kappa$ or the dimension $d$. Our proof relies on comparing the leapfrog integrator with the continuous Hamiltonian dynamics, where we establish a new concentration bound for the acceptance rate. Second, we prove a spectral-gap-based mixing time lower bound for reversible MCMC algorithms on general state spaces. We apply this lower bound result to construct a hard distribution for which MALA requires at least $\tilde{\Omega}(\kappa\sqrt{d})$ steps to mix. The lower bound for MALA matches our upper bound in terms of condition number and dimension. Finally, numerical experiments are included to validate our theoretical results.
Comment: 63 pages, 2 figures
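The classical relation behind such spectral-gap lower bounds is the following textbook statement (given here for context in the finite-state, reversible case, not quoted from the paper, which extends the idea to general state spaces):

```latex
% For a reversible Markov chain with absolute spectral gap \gamma,
% the \epsilon-mixing time satisfies (Levin & Peres, Thm. 12.5):
\[
  t_{\mathrm{mix}}(\epsilon) \;\ge\;
  \left(\frac{1}{\gamma} - 1\right) \log\!\left(\frac{1}{2\epsilon}\right),
\]
% so exhibiting a distribution on which MALA's spectral gap is small
% immediately yields a mixing time lower bound.
```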
Fast MCMC algorithms, Stability and DeepTune
Drawing samples from a known distribution is a core computational challenge common to many disciplines, with applications in statistics, probability, operations research, and other areas involving stochastic models. In statistics, sampling methods are useful for both estimation and inference, including problems such as estimating expectations of desired quantities, computing probabilities of rare events, gauging volumes of particular sets, exploring posterior distributions, and obtaining credible intervals.

Facing massive high-dimensional data, both computational efficiency and good statistical guarantees are increasingly important in modern statistical and machine learning applications. In this thesis, centered around sampling algorithms, we consider fundamental questions about their computational and statistical guarantees: How does one design a fast sampling algorithm, and how long should it be run? What are the statistical learning guarantees of these algorithms? Are there trade-offs between computation and learning?

To answer these questions, we first establish non-asymptotic convergence guarantees for popular MCMC sampling algorithms in the Bayesian literature: the Metropolized random walk, the Metropolis-adjusted Langevin algorithm, and Hamiltonian Monte Carlo. To address a number of technical challenges that arise en route, we develop results based on the conductance profile in order to prove quantitative convergence guarantees for general continuous-state-space Markov chains. Second, to confront a large class of constrained sampling problems, we introduce two new algorithms, the Vaidya and John walks, to sample from polytope-constrained distributions with convergence guarantees. Third, we prove fundamental trade-off results between the statistical learning performance and the convergence rate of any iterative learning algorithm, including sampling algorithms. The trade-off results allow us to show that a too-stable algorithm cannot converge too fast, and vice versa. Finally, to help neuroscientists analyze their massive amounts of brain data, we develop DeepTune, a stability-driven visualization and interpretation framework based on optimization and sampling, for neural-network-based models of neurons in the visual cortex.
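The conductance-based route mentioned above rests on a classical Cheeger-type argument. A standard statement (given for context, not quoted from the thesis): for a reversible, lazy Markov chain with transition kernel $T$ and stationary distribution $\pi$,

```latex
% Conductance of the chain and the resulting mixing time bound
% from an M-warm start (Lovasz-Simonovits style):
\[
  \Phi \;=\; \inf_{S:\,\pi(S)\le 1/2}
  \frac{\int_S T_x(S^c)\,\pi(dx)}{\pi(S)},
  \qquad
  t_{\mathrm{mix}}(\epsilon)
  \;\lesssim\; \frac{1}{\Phi^2}\,\log\!\left(\frac{M}{\epsilon}\right).
\]
% Refining \Phi to a conductance profile \Phi(v) over sets of small
% measure tightens the dependence on the warmness parameter M.
```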